Automation has been a central issue in construction extraction, but no universally accepted algorithm exists yet. Constructions are commonly extracted with user-defined thresholds, which have to be adjusted as the images and the types of constructions vary. To overcome these shortcomings, this paper proposes an unsupervised algorithm for extracting constructions. It segments images by mean shift clustering in the neutrosophic set domain, which makes it possible to detect constructions with a stable threshold. The algorithm is compared with three popular, recently developed supervised techniques on six study images with two different resolutions. Experiments show that, among the four algorithms, the proposed method performs best in construction detection. It not only preserves the original shape of buildings but also extracts each construction as a neat whole. Furthermore, the new method is more robust to images with different resolutions and imaging qualities. In the tests, the new algorithm reaches a kappa coefficient of up to 0.7704 and an overall accuracy of up to 89.8054%, which are relatively high for construction extraction; it can therefore serve as a robust unsupervised technique for extracting constructions.

© 2013 Elsevier GmbH. All rights reserved. 



1. Introduction 

1.1. Constructions detection 

Construction detection plays a significant role in urban planning and in monitoring the development of an area. In addition, detecting constructions contributes to exploring problems of scene segmentation, 3D recovery and shape description in a rich, realistic and demanding environment [1].

In remotely sensed imagery, extracting constructions is difficult because an image is made up of pixels, which only describe simple topological adjacency rather than real-world objects [2]. Segmentation turns numerous pixels into meaningful objects with more informative attributes, such as shape, length, texture and contextual information [3]. Many image segmentation algorithms have been proposed, and they can generally be grouped into three categories: pixel-based, edge-based and region-based. Pixel-based methods [4] are conceptually the simplest way to segment images [5]: pixels are divided into groups by thresholding, but the thresholds usually have to be re-adjusted for every image. For edge-based approaches [6,7], the key step is edge detection. Images are segmented along the detected edges, but these edges are often fragmented, so edge linking also has to be addressed after detection. Region-based algorithms [8] concentrate on region growing and region merging. They take more details of real-world objects into account, but the boundaries of the segmented regions are discontinuous, and merged regions tend to be scattered rather than forming a whole.

In this paper, we propose an unsupervised method that combines the neutrosophic set and mean shift, called NS-MS for short. It can not only detect constructions directly but also preserve their original shape. Mean shift clustering is applied to the image after it has been transformed to the neutrosophic set domain. Constructions can then be extracted from the segmented image in an unsupervised way using spectral information, rather than in a supervised way using textural and contour information, as is required for images segmented by region-based, edge-based or pixel-based techniques.



1.2. Neutrosophic set 

The neutrosophic set is a new concept in image segmentation. It was proposed by Smarandache [9] as an extension of fuzzy logic and has been widely used in philosophy, financial analysis [10,11] and semantic web services [12]. To our knowledge, the neutrosophic set was first introduced to image processing by Guo and Cheng [21], and it has since spread rapidly through image processing algorithms, such as image segmentation [23], image classification [13] and image thresholding [34].

In the neutrosophic set domain, three components are considered, namely true, indeterminate and false elements, whereas fuzzy logic considers only two, true and false.

According to Smarandache, the neutrosophic set is a domain to which an image can be transformed. A neutrosophic set N contains three subsets: T, the true set, which comprises all true elements; I, the indeterminate set, which includes all indeterminate elements; and F, the false set, which consists of all false elements. In general, $N = \{(T, I, F) : T, I, F \subseteq [0, 1]\}$, with $T = \{t : t \in T\}$, $I = \{i : i \in I\}$ and $F = \{f : f \in F\}$. Based on the neutrosophic set, the degree to which a proposition p is true can be expressed as $v(p) = (t, i, f)$ [14], where t, i and f represent the true degree, the indeterminate degree and the false degree, respectively.

Since the neutrosophic set was introduced to image processing, many effective algorithms have adopted the concept. M. Zhang and L. Zhang [15] proposed a neutrosophic approach based on the watershed method. With their formulations for mapping to the neutrosophic domain, it resists noise better than the traditional pixel-based, edge-based and region-based methods and two watershed methods [15]. Cheng and Guo [16] proposed a neutrosophic method for reducing image noise, using a new filtering procedure that decreases the indeterminacy, which is measured by entropy. Guo and Cheng [17] applied the theory to image segmentation with a clustering method, which performs more stably and effectively than the modified fuzzy C-means (MFCM) segmentation algorithm [18]. However, it can only deal with gray-scale images, and its parameters have to be defined manually rather than automatically. To overcome this shortcoming, Sengur and Guo [19] combined neutrosophic set theory with the wavelet transform; their method not only works automatically but also segments images into more intact details than existing methods.

1.3. Mean shift 

Mean shift is a nonparametric kernel density estimation technique based on the Parzen window method for finding the maxima of a kernel density [20]. Recent achievements have made mean shift increasingly popular in image segmentation and computer vision. Park et al. proposed an algorithm that combines adaptive mean shift with statistical theory [21]. By introducing statistics into mean shift, it automatically determines the optimal number of clusters, which makes mean shift clustering a 'one-step' algorithm. Dorin et al. have done much research on bandwidth selection and scale selection for mean shift [22,23]; their segmented results are more continuous and closer to the shapes of real-world objects than those obtained by mean shift with a fixed bandwidth and scale.

The rest of this paper demonstrates how our segmentation algorithm works and how it performs. It is organized as follows: Sections 2 and 3 introduce the neutrosophic set and mean shift, respectively; Section 4 presents our NS-MS algorithm; experiments and discussion are given in Section 5; and conclusions are presented in Section 6.

2. Neutrosophic set 

A neutrosophic set consists of three components, the true set, the indeterminate set and the false set, denoted T, I and F, respectively, all of which are subsets of [0, 1]. The elements t, i and f belong to T, I and F, respectively. In neutrosophic logic, a proposition can be described by $v(p) = (t, i, f)$, meaning that the proposition is t percent true, i percent indeterminate and f percent false. After transformation from the color space domain to the neutrosophic domain, a pixel P(i, j) can be represented as $P_{NS}(i,j) = \{t(i,j), i(i,j), f(i,j)\}$, where t(i,j), i(i,j) and f(i,j) are elements of T, I and F, respectively.

2.1. Transformation 

According to neutrosophic theory, the neutrosophic set is a combination of fuzzy logic and 'indefinite' fuzzy logic [24,25], and its three components influence one another. We improve part of the transformation algorithm described by Sengur and Guo [19]; the transformation is defined as follows:

where g(i,j) is the gray-scale value of pixel P(i,j), and $\bar{g}(i,j)$ is the local mean value of pixel P(i,j) computed with a kernel of width w.
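The authors' improved transformation equations are not reproduced in this text. For orientation, the following is a minimal sketch of the standard Guo and Cheng style mapping that the improvement builds on, and it makes several assumptions. T is the min-max normalized local mean $\bar{g}$. I is the min-max normalized deviation $\delta = |g - \bar{g}|$. F is $1 - T$. The window is an odd w x w box with edge padding. The function names are illustrative and are not from the paper.

```python
import numpy as np

def local_mean(img, w):
    """Mean of each pixel's w x w neighbourhood (w odd), edges padded by replication."""
    r = w // 2
    padded = np.pad(img, r, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(w):
        for dx in range(w):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (w * w)

def _minmax(x):
    """Rescale to [0, 1]; a constant array maps to all zeros."""
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def to_neutrosophic(gray, w=5):
    """Map a gray-scale image to (T, I, F), assuming the standard Guo-Cheng form."""
    g = np.asarray(gray, dtype=float)
    g_bar = local_mean(g, w)        # local mean, \bar{g}(i, j)
    T = _minmax(g_bar)              # true membership
    delta = np.abs(g - g_bar)       # deviation from the local mean
    I = _minmax(delta)              # indeterminacy
    F = 1.0 - T                     # false membership
    return T, I, F
```

In NS-MS this mapping would be applied to each L*u*v* channel separately. Only the T subsets are carried forward to mean shift.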



2.2. Enhancement operation 

When an image is transformed to the neutrosophic set domain, it is divided into three sets, T, I and F. The T set is what we need for further processing, but an enhancement operation is necessary to increase the differences among the values of the elements in T. We adopt the idea put forward by Li et al. [27].

The enhancement parameter is determined automatically from the entropy of the image. Since entropy evaluates the distribution of pixels in the image, it is defined as follows:
$$En_T = -\sum_{i=1}^{h}\sum_{j=1}^{w} t(i,j)\,\log_2 t(i,j) \qquad (13)$$

$$En_F = -\sum_{i=1}^{h}\sum_{j=1}^{w} f(i,j)\,\log_2 f(i,j) \qquad (14)$$

$$En = En_I + En_T + En_F \qquad (15)$$

En represents the entropy of the image in the neutrosophic set domain, and it is the sum of the entropies of T, I and F, where $En_I$, $En_T$ and $En_F$ are the entropies of subsets I, T and F, respectively. The enhancement parameter β is computed from En as

$$\beta = 0.99 - 0.99 \times \frac{En - En_{\min}}{En_{\max} - En_{\min}} \qquad (16)$$

$$En_{\max} = -\log_2 \frac{1}{h \times w} \qquad (17)$$

where h and w represent the height and width of the image. The enhancement operation is repeated until $En_I$ changes little.
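The entropy terms and the stopping rule above can be sketched as follows. This is a minimal sketch under two assumptions. First, $0 \log_2 0$ is taken as 0. Second, $En_{\min}$ and $En_{\max}$ are passed in by the caller rather than fixed, because the text does not define $En_{\min}$. The enhancement formula of Li et al. [27] is not reproduced here, so `enhance_step` is a hypothetical callable that the user supplies.

```python
import numpy as np

def subset_entropy(x):
    """-sum x*log2(x) over all pixels, with 0*log2(0) taken as 0 (eqs. 13-14)."""
    x = np.asarray(x, dtype=float).ravel()
    x = x[x > 0]
    return float(-np.sum(x * np.log2(x)))

def ns_entropy(T, I, F):
    """En = En_I + En_T + En_F (eq. 15)."""
    return subset_entropy(I) + subset_entropy(T) + subset_entropy(F)

def beta_from_entropy(en, en_min, en_max):
    """Enhancement parameter of eq. (16); en_min and en_max come from the caller."""
    return 0.99 - 0.99 * (en - en_min) / (en_max - en_min)

def enhance_until_stable(T, I, F, enhance_step, tol=1e-3, max_iter=50):
    """Repeat a (hypothetical) enhancement step until En_I changes by less than tol."""
    en_i = subset_entropy(I)
    for _ in range(max_iter):
        T, I, F = enhance_step(T, I, F)
        new_en_i = subset_entropy(I)
        if abs(new_en_i - en_i) < tol:
            break
        en_i = new_en_i
    return T, I, F
```

Passing the enhancement step in as a function keeps the entropy-driven stopping criterion separate from the enhancement formula, which this sketch does not assume.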

3. Mean shift algorithm

Given n random points $X_i$, $i = 1, 2, \ldots, n$, in the d-dimensional Euclidean space, the multivariate kernel density estimate at point X can be defined as follows [17]:

$$\hat{f}(X) = \frac{1}{n}\sum_{i=1}^{n} K_H(X - X_i) \qquad (18)$$

where

$$K_H(X) = |H|^{-1/2}\, K\!\left(H^{-1/2} X\right) \qquad (19)$$

and H is a symmetric positive-definite $d \times d$ matrix. The definition of H has been discussed by Dorin [22]. $K(X)$ is a symmetric kernel that satisfies

$$K(X) = c_{k,d}\, k\!\left(\|X\|^2\right) \qquad (20)$$

The normalization constant $c_{k,d}$ is strictly positive and makes $K(X)$ integrate to one, while the profile $k(x)$ of the kernel is defined for $x \ge 0$. To simplify the algorithm, H is defined as $H = h^2 I$, where I is the identity matrix and h is the bandwidth. Formula (18) can then be rewritten as

$$\hat{f}_{h,K}(X) = \frac{c_{k,d}}{n h^{d}} \sum_{i=1}^{n} k\!\left(\left\|\frac{X - X_i}{h}\right\|^2\right) \qquad (21)$$

The gradient of the kernel density estimate can be written as

$$\nabla \hat{f}_{h,K}(X) = \frac{2 c_{k,d}}{n h^{d+2}} \sum_{i=1}^{n} (X - X_i)\, k'\!\left(\left\|\frac{X - X_i}{h}\right\|^2\right) \qquad (22)$$

To simplify the expression of (22), two new functions are defined:

$$g(x) = -k'(x) \qquad (23)$$

$$G(X) = c_{g,d}\, g\!\left(\|X\|^2\right) \qquad (24)$$

For formula (22), $k'(x)$ exists in most cases when $x \ge 0$ [21]. Since $c_{g,d}$ in (24) is a normalization constant, the gradient of the kernel density in (22) can be rewritten as

$$\nabla \hat{f}_{h,K}(X) = \frac{2 c_{k,d}}{n h^{d+2}} \left[\sum_{i=1}^{n} g\!\left(\left\|\frac{X - X_i}{h}\right\|^2\right)\right] \left[\frac{\sum_{i=1}^{n} X_i\, g\!\left(\left\|\frac{X - X_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{X - X_i}{h}\right\|^2\right)} - X\right] \qquad (25)$$

Here, the mean shift vector is defined as the second term of (25):

$$m_{h,G}(X) = \frac{\sum_{i=1}^{n} X_i\, g\!\left(\left\|\frac{X - X_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{X - X_i}{h}\right\|^2\right)} - X \qquad (26)$$

Combining formulas (21) to (26), the mean shift vector can be re-expressed as

$$m_{h,G}(X) = \frac{1}{2} h^2 c\, \frac{\nabla \hat{f}_{h,K}(X)}{\hat{f}_{h,G}(X)} \qquad (27)$$

where $c = c_{g,d}/c_{k,d}$ and $\hat{f}_{h,G}(X)$ is the density estimate computed with kernel G.

It can be seen from (27) that the mean shift vector at X obtained with kernel G is proportional to the normalized gradient of the density estimate obtained with kernel K. Thus $m_{h,G}(X)$ always points in the direction of maximum increase of the density, and the density reaches a mode where $\nabla \hat{f}_{h,K}(X) = 0$. Therefore, we obtain

$$y_{j+1} = \frac{\sum_{i=1}^{n} X_i\, g\!\left(\left\|\frac{y_j - X_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{y_j - X_i}{h}\right\|^2\right)}, \quad j = 1, 2, \ldots \qquad (28)$$

That is, $y_{j+1}$ is the weighted average of the points $X_i$ computed with kernel G, and $y_j$ is the current position of the center of kernel G. Based on (26) and (28), we obtain the iteration

$$m_{h,G}(y_j) = y_{j+1} - y_j \qquad (29)$$

which is repeated until

$$m_{h,G}(y_c) = y_c - y_c = 0 \qquad (30)$$

that is, until the gradient of the kernel density equals zero; $y_c$ is the final result of the mean shift procedure.

4. NS-MS algorithm

Following the work of Sengur and Guo [19], and because the L*u*v* color space (L* denotes lightness, while u* and v* denote chromaticity) separates pixels better according to their spectral characteristics, the image is first converted from RGB to L*u*v*. Secondly, the image is decomposed into its three channels. Thirdly, each of the three channels is transformed to the neutrosophic set domain. Fourthly, the enhancement parameter is obtained by computing the entropies of the image, and the image is enhanced in the neutrosophic set domain. Next, the true subsets of the three channels are merged into one set. Then, mean shift is applied to the merged true set, with the kernel function chosen as

$$K_{h_s,h_r}(X) = \frac{C}{h_s^{2}\, h_r^{P}}\; k\!\left(\left\|\frac{X^s}{h_s}\right\|^2\right) k\!\left(\left\|\frac{X^r}{h_r}\right\|^2\right) \qquad (31)$$

where P is the color dimension of the image and the spatial dimension is two. $X^s$ is the spatial feature vector, and $X^r$ is the color feature vector. $h_s$ and $h_r$ are the spatial bandwidth and the color bandwidth, respectively, which have to be set by the user; here, $h_s = 20$ and $h_r = 16$. C is the normalization constant [26].

Finally, the image is converted back to the RGB color space. The whole procedure is summarized in Fig. 1.

Fig. 1. Procedure of NS-MS algorithm.
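The iteration (28) to (30) with the joint spatial-color kernel (31) can be sketched as below. This is a minimal sketch under stated assumptions. The profile is Gaussian, $k(x) = e^{-x/2}$, so $g = -k'$ is proportional to $e^{-x/2}$; the paper does not name its profile. With a per-dimension bandwidth vector $[h_s, h_s, h_r, \ldots, h_r]$, the product kernel reduces to a single Gaussian on rescaled coordinates. The true-set values are assumed to be on a 0-255 scale, so that $h_r = 16$ is meaningful. The function names are illustrative, and running one search per pixel is O(N²), so this only suits small images.

```python
import numpy as np

def joint_features(true_set):
    """(H, W, P) array -> (H*W, 2+P) rows of [row, col, value_1, ..., value_P]."""
    h, w, p = true_set.shape
    rows, cols = np.mgrid[0:h, 0:w]
    return np.column_stack([rows.ravel(), cols.ravel(), true_set.reshape(-1, p)])

def mean_shift_mode(points, start, bandwidth, tol=1e-3, max_iter=100):
    """Iterate eq. (28) from `start` until the shift of eq. (29) is ~0 (eq. 30).

    `bandwidth` is a scalar h or a per-dimension vector. Start from a data
    point so that at least one weight is 1 and the denominator never vanishes.
    """
    pts = np.asarray(points, dtype=float)
    bw = np.asarray(bandwidth, dtype=float)
    y = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        d2 = np.sum(((pts - y) / bw) ** 2, axis=1)  # ||(y_j - X_i)/h||^2
        g = np.exp(-0.5 * d2)                       # g = -k' for k(x) = exp(-x/2), up to a constant
        y_next = g @ pts / g.sum()                  # weighted average, eq. (28)
        shift = y_next - y                          # m_{h,G}(y_j), eq. (29)
        y = y_next
        if np.linalg.norm(shift) < tol:             # stop when the shift is ~0, eq. (30)
            break
    return y

def segment(true_set, hs=20.0, hr=16.0):
    """Replace each pixel's values with those of the mode its mean shift reaches."""
    feats = joint_features(true_set)
    bw = np.array([hs, hs] + [hr] * true_set.shape[2])
    modes = np.array([mean_shift_mode(feats, x, bw) for x in feats])
    return modes[:, 2:].reshape(true_set.shape)
```

Because the spatial and color parts use separate bandwidths, pixels that are close in space but differ strongly in color do not pull each other's modes. This is the behavior that equation (31) is meant to encode.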

5. Experiments and discussion 

To the best of our knowledge, although numerous methods have been proposed for evaluating the performance of image segmentation [27-30], there is still no universally accepted one. Most segmentation methods are assessed by comparing their classification performance with that of well-known methods or recent developments in the field.

To see how well the NS-MS algorithm works in construction detection, we compare its performance not only with the traditional mean shift method but also with two recent and widely used software packages for image segmentation: Berkeley ImageSeg (BIS: http://www.imageseg.com) and Environment for Visualizing Images Feature Extraction (ENVI-EX: http://www.exelisvis.com/). Both are object-based methods that segment images into real-world segments. In object-based theory, every pixel is initially treated as an object, and two operations are needed to segment the image [31].

The first operation is calculating the difference between adjacent objects, which is obtained from the difference in spectral heterogeneity $h_p$ and the difference in shape heterogeneity $h_t$.

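$$h_p = \sum_{i=1}^{P} w_i \left( n_{ab}\,\sigma_{i,ab} - \left( n_a\,\sigma_{i,a} + n_b\,\sigma_{i,b} \right) \right) \qquad (32)$$

where $0 < w_i < 1$, $\sum_{i=1}^{P} w_i = 1$, $w_i$ is the weight of band i, P is the number of bands of the image, n represents the area of an object and $\sigma_i$ is the standard deviation of an object in band i. Subscripts a and b denote the two candidate objects, and ab denotes the object resulting from their merge.

$$h_t = w_c h_c + w_s h_s \qquad (33)$$

$$h_c = \frac{n_{ab}\, l_{ab}}{\sqrt{n_{ab}}} - \left( \frac{n_a\, l_a}{\sqrt{n_a}} + \frac{n_b\, l_b}{\sqrt{n_b}} \right) \qquad (34)$$

$$h_s = \frac{n_{ab}\, l_{ab}}{b_{ab}} - \left( \frac{n_a\, l_a}{b_a} + \frac{n_b\, l_b}{b_b} \right) \qquad (35)$$

where $h_c$ and $h_s$ are the compactness and smoothness heterogeneity, l is the perimeter of an object and b is the perimeter of the object's minimum enclosing rectangle. $w_c + w_s = 1$ and $0 < w_c, w_s < 1$; both weights are user-defined.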

The second operation is merging. Based on the differences computed above, a combined criterion can be formed as

$$f = w\, h_p + (1 - w)\, h_t \qquad (36)$$

where w is also assigned by the user. If f is smaller than the predetermined merge scale, objects a and b are merged into one object (Fig. 1).
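Equations (32) to (36) amount to a merge test on a pair of adjacent objects. The sketch below implements that test as written, and makes some assumptions. The per-object statistics (area, per-band standard deviation, perimeter, bounding-rectangle perimeter) are computed elsewhere, and `RegionStats` is a hypothetical container. The defaults $w = w_c = 0.5$ and merge scale 50 mirror the BIS settings reported in Section 5.2.1. The function and class names are illustrative and are not part of either software package.

```python
import math
from dataclasses import dataclass
from typing import Sequence

@dataclass
class RegionStats:
    n: float                 # area of the object (pixels)
    sigma: Sequence[float]   # standard deviation in each band
    l: float                 # perimeter of the object
    b: float                 # perimeter of the minimum enclosing rectangle

def spectral_heterogeneity(a, b, ab, band_weights):
    """Eq. (32): weighted increase in n*sigma caused by merging a and b into ab."""
    return sum(w * (ab.n * s_ab - (a.n * s_a + b.n * s_b))
               for w, s_a, s_b, s_ab in zip(band_weights, a.sigma, b.sigma, ab.sigma))

def _shape_term(a, b, ab, fn):
    return fn(ab) - (fn(a) + fn(b))

def shape_heterogeneity(a, b, ab, w_c):
    """Eq. (33) with h_c from eq. (34) and h_s from eq. (35); w_s = 1 - w_c."""
    h_c = _shape_term(a, b, ab, lambda r: r.n * r.l / math.sqrt(r.n))
    h_s = _shape_term(a, b, ab, lambda r: r.n * r.l / r.b)
    return w_c * h_c + (1.0 - w_c) * h_s

def merge_cost(a, b, ab, band_weights, w=0.5, w_c=0.5):
    """Eq. (36): f = w*h_p + (1 - w)*h_t."""
    return (w * spectral_heterogeneity(a, b, ab, band_weights)
            + (1.0 - w) * shape_heterogeneity(a, b, ab, w_c))

def should_merge(a, b, ab, band_weights, merge_scale=50.0, **kw):
    """Merge a and b when the combined criterion falls below the merge scale."""
    return merge_cost(a, b, ab, band_weights, **kw) < merge_scale
```

For two identical single pixels merged into a 1 x 2 object, the spectral term is zero and only the small compactness penalty remains, so the pair merges under any reasonable scale.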

5.1. Data 

We use parts of Xinjiang Province and Beijing as study areas. For Xinjiang Province, four GeoEye images collected on May 24th, 2011 are used to extract constructions (Figs. 2-5). The images are mainly occupied by constructions and roads, and in Fig. 2 there is considerable confusion among buildings, open space and roads. Under such conditions, differences in the robustness of the compared algorithms become obvious. At the time of writing, GeoEye offered the highest-resolution and most accurate commercially available unclassified Earth imagery. The proposed method, Mean Shift (MS for short), Berkeley ImageSeg (BIS for short) and ENVI-EX are used to segment and extract constructions from the four study images one by one, and the results of the four methods for one image are shown in one figure: the segmented results are shown in the left column and the corresponding extracted constructions in the right column (see Figs. 2-7).

Fig. 2. Comparison of segmented results and extracted constructions. (a.1) Original image; (a.2) constructions detected manually; (b.1) segmented by NS-MS; (b.2) constructions detected by NS-MS; (c.1) segmented by MS; (c.2) constructions detected by MS; (d.1) segmented by BIS; (d.2) constructions detected by BIS; (e.1) segmented by ENVI-EX; (e.2) constructions detected by ENVI-EX.

To test whether the proposed technique still extracts constructions well from images of lower resolution and relatively poor quality, we also use images of part of Beijing acquired by the Chinese Resources Satellite number one 02C. The images were recorded on March 8th, 2012 with a spatial resolution of 2.36 m and mainly cover constructions (Figs. 6 and 7).




Fig. 3. Comparison of segmented results and extracted constructions. (a.1) Original image; (a.2) constructions detected manually; (b.1) segmented by NS-MS; (b.2) constructions detected by NS-MS; (c.1) segmented by MS; (c.2) constructions detected by MS; (d.2) constructions detected by BIS; (e.1) segmented by ENVI-EX; (e.2) constructions detected by ENVI-EX.



5.2. Results and discussion 

5.2.1. Parameter determination

Since green vegetation has distinctive spectral characteristics, we first remove vegetation from the six study images to simplify their features. All four techniques need parameters: some have to be adjusted as the images vary, while others are set once ('one-off'). The parameters can be grouped into two parts, one for segmentation and the other for extraction.

For segmentation, to assess the robustness of NS-MS, MS, BIS and ENVI-EX objectively, each method uses the same parameters for all six images. For NS-MS and MS, the spatial bandwidth $h_s$ and the color bandwidth $h_r$ are identical during the mean shift process, namely 20 and 16, respectively. BIS has default parameters whose segmentation performance is competitive: the two weights $w_c$ and w are both set to 0.5, and the merge scale is 50. Although ENVI-EX uses the same algorithm as BIS, it has no default values and only two segmentation parameters, called scale level and merge scale. According to the ENVI-EX tutorial, a scale level of 30.0 best delineates the tops of constructions while preserving their details, and a merge scale of 94 is a good choice [33].

Fig. 4. Comparison of segmented results and extracted constructions. (b.1) Segmented by NS-MS; (b.2) constructions detected by NS-MS; (c.1) segmented by MS; (c.2) constructions detected by MS; (d.1) segmented by BIS; (d.2) constructions detected by BIS; (e.1) segmented by ENVI-EX; (e.2) constructions detected by ENVI-EX.

For construction detection, we divide the four methods into two groups according to their segmented results.

Group one contains only NS-MS. Its segmentation results (Figs. 2b.1, 3b.1, 4b.1, 5b.1, 6b.1 and 7b.1) show that pixels of the same feature have similar spectral values and differ markedly from those of other features. Owing to this, spectral information can be used to extract constructions from all six images with a fixed threshold.
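Fixed-threshold extraction on the NS-MS output can be sketched as a single range test that is reused unchanged across images. This is a minimal sketch under stated assumptions. The text does not report which band or which threshold values are used, so the bounds `lo` and `hi` are hypothetical placeholders. The segmented image is assumed to be a NumPy array of shape (H, W) or (H, W, bands).

```python
import numpy as np

def extract_constructions(segmented, lo, hi):
    """Boolean construction mask: pixels whose segmented value lies in [lo, hi].

    For multi-band input (H, W, bands), lo and hi may be per-band vectors,
    and a pixel counts as construction only if every band is within bounds.
    """
    seg = np.asarray(segmented, dtype=float)
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    inside = (seg >= lo) & (seg <= hi)          # broadcast per band
    return inside if seg.ndim == 2 else inside.all(axis=-1)
```

The point of the design is that the same `lo` and `hi` can be applied to every study image. The comparison methods, by contrast, need per-image supervised training or rule tuning.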

The other three methods form the second group. Their segmented results still preserve the details of the original image, so the characteristics of connected regions can help in construction extraction. However, to guarantee the precision of extraction, supervised classification is adopted to extract constructions from the segmented images, except for ENVI-EX. ENVI-EX is special because of the design of its software package: several features of connected regions (i.e., shape, texture and band ratio) are used to extract constructions. According to the ENVI-EX Tutorial [32], four features can be used to extract buildings and rooftops. The first is band ratio, because the normalized difference vegetation index (NDVI) of buildings and rooftops is close to zero. The second is rectangular fit, which measures how closely the shape of buildings and rooftops approximates a rectangle. The third is area, which separates these buildings from industrial and other types of buildings. Finally, a spectral criterion is used: rooftops are usually dark, and their values in the green band are relatively low. However, the specific values have to be adjusted for each image according to human knowledge and reasoning about the specific feature types.

Fig. 5. Comparison of segmented results and extracted constructions. (c.1) Segmented by MS; (c.2) constructions detected by MS; (d.1) segmented by BIS; (d.2) constructions detected by BIS; (e.1) segmented by ENVI-EX; (e.2) constructions detected by ENVI-EX.

Fig. 6. Comparison of segmented results and extracted constructions. (a.1) Original image; (a.2) constructions detected manually; (b.1) segmented by NS-MS; (b.2) constructions detected by NS-MS; (c.1) segmented by MS; (c.2) constructions detected by MS; (d.1) segmented by BIS; (d.2) constructions detected by BIS; (e.1) segmented by ENVI-EX; (e.2) constructions detected by ENVI-EX.

Fig. 7. Comparison of segmented results and extracted constructions. (a.1) Original image; (a.2) constructions detected manually; (b.1) segmented by NS-MS; (b.2) constructions detected by NS-MS; (c.1) segmented by MS; (c.2) constructions detected by MS; (d.1) segmented by BIS; (d.2) constructions detected by BIS; (e.1) segmented by ENVI-EX; (e.2) constructions detected by ENVI-EX.

Table 1
Statistical analysis of constructions extracted by four methods.

| Image | Method | Producer's accuracy (%) | User's accuracy (%) | Overall accuracy (%) | Kappa |
|---|---|---|---|---|---|
| Fig. 2 | NS-MS | 78.66 | 72.93 | 78.1418 | 0.5588 |
| | MS | 57.15 | 65.27 | 68.3200 | 0.3449 |
| | BIS | 84.71 | 57.81 | 66.6528 | 0.3563 |
| | ENVI-EX | 80.82 | 65.73 | 73.4835 | 0.4742 |
| Fig. 3 | NS-MS | 89.15 | 79.51 | 89.8054 | 0.7660 |
| | MS | 87.01 | 76.12 | 87.8599 | 0.7230 |
| | BIS | 89.96 | 67.98 | 84.2021 | 0.6564 |
| | ENVI-EX | 73.84 | 84.31 | 87.9751 | 0.7040 |
| Fig. 4 | NS-MS | 76.93 | 87.91 | 83.7949 | 0.6740 |
| | MS | 61.23 | 63.15 | 64.1237 | 0.2807 |
| | BIS | 92.66 | 73.95 | 80.7532 | 0.6180 |
| | ENVI-EX | 87.13 | 71.70 | 77.2414 | 0.5477 |
| Fig. 5 | NS-MS | 89.15 | 83.95 | 88.8741 | 0.7704 |
| | MS | 59.38 | 80.82 | 78.1802 | 0.5237 |
| | BIS | 83.98 | 71.00 | 79.9314 | 0.5940 |
| | ENVI-EX | 82.09 | 80.98 | 85.1682 | 0.6914 |
| Fig. 6 | NS-MS | 92.54 | 66.11 | 73.6326 | 0.4797 |
| | MS | 52.11 | 48.13 | 50.0273 | 0.0021 |
| | BIS | 93.14 | 63.81 | 71.3304 | 0.4356 |
| | ENVI-EX | 94.50 | 59.75 | 66.7784 | 0.3489 |
| Fig. 7 | NS-MS | 81.42 | 86.68 | 81.7092 | 0.6272 |
| | MS | 53.10 | 62.55 | 53.6944 | 0.0741 |
| | BIS | 79.09 | 88.15 | 81.4376 | 0.6251 |
| | ENVI-EX | 92.36 | 79.68 | 81.6494 | 0.6077 |

5.2.2. Performance evaluation by vision

Figs. 2a.2, 3a.2, 4a.2 and 5a.2 were generated by visual interpretation and serve as the evaluation reference for the four methods. Figs. 6a.2 and 7a.2 were produced by supervised classification because, given the low resolution of the Resources Satellite number one 02C images, the outlines of constructions can hardly be identified exactly by eye. We compare each result in detail with the reference in the same figure.

The constructions extracted by NS-MS are neater, and fewer roads and open spaces are extracted by mistake than with the other three methods. Furthermore, the blocks generated by NS-MS are smoother and closer to the shapes of real-world objects. The other three techniques perform relatively poorly. MS extracts less information than the other methods (see Figs. 2c.2, 3c.2, 4c.2, 5c.2, 6c.2 and 7c.2), its results are strongly affected by shadows (see Figs. 4c.2 and 5c.2), and some obvious buildings are missed (see Fig. 4c.2). Redundant information is a major weakness of BIS (see Figs. 2d.2 and 3d.2), and the distribution of constructions in its results is not clear. Both BIS and ENVI-EX misclassify open space and roads as constructions to a large extent (see Figs. 5d.2 and 5e.2). Moreover, the extractions by ENVI-EX are fragmented and badly distort the shapes of the real features (see Fig. 3e.2).

The segmentation results show that NS-MS is distinctive (see Figs. 2b.1, 3b.1, 4b.1, 5b.1, 6b.1 and 7b.1): they contain blocks whose spectral values differ considerably from one another, while the pixels within one block share similar spectral characteristics. This lays a solid foundation for detecting constructions and separating them from other features with spectral thresholds. The segmented images of the other three algorithms mainly retain the spectral signature of the original image.

5.2.3. Performance evaluation by statistics

An accuracy assessment of the extraction results for each image is given in Table 1. It contains the producer's and user's accuracies, the overall accuracy and Cohen's kappa coefficient of agreement. Cohen's kappa coefficient is a statistical measure of inter-rater (or inter-annotator) agreement [33] for qualitative (categorical) items. It is more robust than simple percent agreement because it takes into account agreement occurring by chance. It measures the agreement between two raters: one is regarded as the ground-truth classification, and the other is the result to be evaluated.

The kappa coefficient is defined as

$$K = \frac{P_1 - P_2}{1 - P_2} \qquad (37)$$

$$P_1 = \frac{N_s}{N} \qquad (38)$$

$$P_2 = \frac{N_{t1} \times N_{s1} + N_{t0} \times N_{s0}}{N \times N} \qquad (39)$$

where $P_1$ is the observed agreement and $P_2$ is the agreement expected by chance. N is the total number of pixels in each image, and $N_s$ is the number of pixels assigned to the same category in both images, one of which is the ground-truth image and the other the classification image evaluated by the rater. The number of pixels of the target object in the ground truth is $N_{t1}$, and the number of pixels classified as the target object is $N_{s1}$. Accordingly, the number of non-target pixels in the ground truth is $N_{t0}$, and the number classified as non-target is $N_{s0}$.
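The accuracy measures in Table 1 can be computed from a binary construction mask and a reference mask, as in the sketch below. The kappa part follows (37) to (39) directly. Producer's and user's accuracy use their standard definitions, which the text does not spell out: correctly detected construction pixels divided by reference construction pixels and by detected construction pixels, respectively. The grading bands follow Fleiss [35]. The function names are illustrative, and returning 1.0 when $P_2 = 1$ is an assumption for a degenerate case that the paper does not cover.

```python
import numpy as np

def accuracy_report(reference, predicted):
    """Producer's, user's and overall accuracy plus kappa for binary masks (True = construction)."""
    ref = np.asarray(reference, dtype=bool).ravel()
    pred = np.asarray(predicted, dtype=bool).ravel()
    n = ref.size                                      # N
    n_s = int(np.sum(ref == pred))                    # N_s: same label in both images
    n_t1, n_t0 = int(ref.sum()), int((~ref).sum())    # ground-truth target / non-target
    n_s1, n_s0 = int(pred.sum()), int((~pred).sum())  # classified target / non-target
    p1 = n_s / n                                      # eq. (38)
    p2 = (n_t1 * n_s1 + n_t0 * n_s0) / (n * n)        # eq. (39)
    kappa = (p1 - p2) / (1 - p2) if p2 < 1 else 1.0   # eq. (37); degenerate case set to 1.0
    tp = int(np.sum(ref & pred))                      # construction in both
    return {
        "producer": tp / n_t1 if n_t1 else float("nan"),
        "user": tp / n_s1 if n_s1 else float("nan"),
        "overall": p1,
        "kappa": kappa,
    }

def fleiss_band(kappa):
    """Grade a kappa value using Fleiss's guidelines."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"
```

For example, a four-pixel reference [1, 1, 0, 0] against a prediction [1, 0, 0, 0] gives $P_1 = 0.75$, $P_2 = 0.5$ and a kappa of 0.5.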

Table 1 documents the detailed accuracy assessment of the four methods (NS-MS, MS, BIS and ENVI-EX) for extracting constructions, based on the reference data in each figure. Kappa coefficients are graded according to Fleiss [35]: values above 0.75 are regarded as excellent, values from 0.40 to 0.75 as fair to good, and values below 0.40 as poor.

The kappa coefficients in Table 1 show that NS-MS performs best, with the highest kappa coefficient among the four methods for every image. All of its kappa coefficients are higher than 0.40, and two of them (Figs. 3 and 5) exceed 0.75, which is regarded as excellent. BIS and ENVI-EX perform neck and neck: each performs poorly on one image, with a kappa coefficient below 0.40 (Fig. 2 for BIS and Fig. 6 for ENVI-EX), and none of their kappa coefficients exceeds 0.75. MS performs poorly on four images (see Figs. 2, 4, 6 and 7), with kappa coefficients below 0.40.

The overall accuracies broadly follow the same trend as the kappa coefficients. NS-MS reaches up to 89.8054%, with an average of 82.6597%. BIS and ENVI-EX are roughly equal, both averaging about 78%. The average overall accuracy of MS is 67.03%, indicating its poor performance in extracting constructions.
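The averages quoted above can be checked directly against the overall accuracy column of Table 1. The short script below copies those values (Figs. 2-7, in order) and averages them per method. Its only purpose is to reproduce the arithmetic.

```python
# Overall accuracy (%) for Figs. 2-7, in order, copied from Table 1.
OVERALL_ACCURACY = {
    "NS-MS":   [78.1418, 89.8054, 83.7949, 88.8741, 73.6326, 81.7092],
    "MS":      [68.3200, 87.8599, 64.1237, 78.1802, 50.0273, 53.6944],
    "BIS":     [66.6528, 84.2021, 80.7532, 79.9314, 71.3304, 81.4376],
    "ENVI-EX": [73.4835, 87.9751, 77.2414, 85.1682, 66.7784, 81.6494],
}

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

averages = {method: mean(v) for method, v in OVERALL_ACCURACY.items()}
best = {method: max(v) for method, v in OVERALL_ACCURACY.items()}

for method in OVERALL_ACCURACY:
    print(f"{method:8s} mean {averages[method]:.4f}%  best {best[method]:.4f}%")
```

The NS-MS mean works out to about 82.66%, MS to about 67.03%, BIS to about 77.38% and ENVI-EX to about 78.72%, which is consistent with the figures in the text.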

Although the images have two different spatial resolutions, NS-MS continues to perform well, extracting constructions that are smooth and shaped like real-world objects. MS, by contrast, performs worse, with lower overall accuracies and kappa coefficients on both lower-resolution images (see Figs. 6c.2 and 7c.2). BIS and ENVI-EX are both affected as the resolution decreases, and their performances are about the same.

In general, the four techniques achieve lower overall accuracies on Figs. 2 and 6 than on the other figures. One possible reason is that the original images of Figs. 2 and 6 contain numerous blocks that are easily confused with roads and open space, which hinders construction detection. Nevertheless, NS-MS still achieves the highest kappa coefficients on these images (0.5588 and 0.4797). This is because NS-MS segments the image in the neutrosophic set domain, which enlarges the spectral differences between pixels of different categories and smooths the spectral differences between pixels of the same category.

Compared with MS, BIS and ENVI-EX, NS-MS is a robust algorithm for image segmentation and construction extraction, regardless of the image resolution and the distribution of constructions and roads in the image. BIS and ENVI-EX perform relatively well except under some conditions, e.g., when objects are confused with each other to a large extent. MS segments images well, but its construction extraction is affected by the image resolution and by confusion between different objects.

6. Summary 

It is commonly believed that there are two main ways to extract constructions after segmenting images. One is supervised classification. The other is extracting connected components based on geometric characteristics and texture features, but the thresholds for each characteristic are often one-off because building styles and study images vary. To address these problems, this paper proposes a new unsupervised algorithm that extracts buildings directly from segmented images. It combines the neutrosophic set and mean shift to segment images, producing segmented results in a new form; owing to this, constructions can be extracted from spectral information with a stable threshold. NS-MS has two key steps: the transformation from color space to the neutrosophic set domain, and mean shift segmentation in the neutrosophic set domain.

Compared with three commonly used methods, MS, BIS and ENVI-EX, NS-MS has six main characteristics in construction detection.

• Real-world object shaped and smoothed: constructions detected by this algorithm are shaped like real-world objects, and the pixels of one category are smoothed.

• Robust: constructions can be extracted with this method from images of both high and relatively low quality.

• Unsupervised: the technique extracts constructions in 'one step' without human intervention.

• One parameter: the parameter used in this algorithm does not vary from image to image.

• Dependable: roads and open space, which are frequently confused with constructions, can be excluded as well.

• Less redundant information: extracted constructions are neat and distinct; they are extracted as blocks with fewer trivial spots around them.